Abstract

Our goal is to extend information geometry to situations where statistical
modeling is not obvious. The setting is that of modeling experimental
data. Quite often the data are not of a statistical nature. Sometimes also
the model is not a statistical manifold. An example of the former is the
description of the Bose gas in the grand canonical ensemble. An example
of the latter is the modeling of quantum systems with density matrices.
Conditional expectations in the quantum context are reviewed. The border
problem is discussed: through conditioning the model point shifts to
the border of the differentiable manifold.


1 Introduction 

One of the goals of information geometry [?] is the study of the geometry of
a statistical manifold M. A tool suited for this study is the divergence
function $D(p\|q)$, called relative entropy in the physics literature. It
compares two probability distributions p and q. It cannot be negative and
vanishes if and only if p = q. In our recent works [?] we have stressed the
importance of considering the statistical manifold M as embedded in the set of
all probability distributions. In particular, the divergence function
$D(p\|q)$, with p not belonging to M, can be used to characterize exponential
families. We also stressed that it is not strictly necessary for the first
argument of the divergence to be a probability distribution. The first example
given in this paper is an illustration of this point.

In quantum probability [?] both arguments of the divergence function are
replaced by density matrices, which are the quantum analogues of probability
distributions. A renewed interest in quantum probability comes from quantum
information theory (see for instance [?]). The theoretical developments
are accompanied by a large number of novel experiments. Some of them are
mentioned below [?]. They confirm the validity of quantum mechanics
but challenge our understanding of nature. The present paper tries to situate
some recent insights in the context of quantum conditional expectations. In
particular, weak measurement theory [?] is considered.

2 The Ideal Bose Gas


The result of a thought experiment on the ideal Bose gas is a sequence of
non-negative integers $n_1, n_2, \ldots$ with finite sum $\sum_j n_j < +\infty$.
A model for these data is a two-parameter family of probability distributions

The numbers $\epsilon_j$ are supposed to be known and to increase fast enough
with the index j to guarantee the convergence of the infinite product. The
parameters $\beta$ and $\mu$ are real. By assumption, $\beta > 0$ and
$\mu < \epsilon_j$ for all j.

The model space is the statistical manifold M formed by the probability
distributions $p_{\beta,\mu}$. However, the data produced by the measurement
are, strictly speaking, not of a stochastic origin. Indeed, a more detailed
modeling of the experiment involves quantum mechanics and quantum measurement
theory. The latter has been much debated since the introduction of so-called
weak measurements [?]. Hence, we cannot speculate about a possible stochastic
origin of the data.

If all we know about the experiment is that it produces the sequence n, then
the simplest modeling we can do is to fit the model to the data with a method
which can also be used outside the conventional settings of statistics. Our
proposal is to do the fitting by minimization of a divergence function.

The Kullback-Leibler divergence between two probability measures can be
easily generalized to a divergence between a sequence of integers n and a point
$(\beta, \mu)$ of the statistical manifold M. Our ansatz is


$$D(n\|\beta,\mu) = \ln Z(\beta,\mu) + \beta \sum_j (\epsilon_j - \mu)\, n_j. \qquad (3)$$
Minimizing this divergence produces a best fit for the data. If such a best fit
$\beta, \mu$ exists we say that it is the orthogonal projection of n on M. The
reference to orthogonality is justified by the knowledge that a Pythagorean
theorem holds for the divergence (3); see [3]. Let us analyze in what follows
the properties of this minimization procedure.
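
The fitting procedure can be sketched numerically. The sketch below is an illustration under stated assumptions, not the method of the paper: it assumes the standard grand canonical form $\ln Z(\beta,\mu) = -\sum_j \ln(1 - e^{-\beta(\epsilon_j-\mu)})$ truncated to finitely many levels, and it minimizes the divergence (3) with a damped Newton iteration in the coordinates $a = \beta$, $b = \beta\mu$. The function names `log_Z`, `divergence` and `fit` are hypothetical.

```python
import math

def log_Z(beta, mu, eps):
    """ln Z of an ideal Bose gas truncated to the levels eps (assumed form)."""
    return -sum(math.log1p(-math.exp(-beta * (e - mu))) for e in eps)

def divergence(n, beta, mu, eps):
    """D(n || beta, mu) = ln Z(beta, mu) + beta * sum_j (eps_j - mu) * n_j."""
    return log_Z(beta, mu, eps) + beta * sum((e - mu) * k for e, k in zip(eps, n))

def fit(n, eps, max_iter=100, tol=1e-10):
    """Minimize the divergence by a damped Newton method; returns (beta, mu).

    The iteration runs in the coordinates a = beta, b = beta * mu,
    in which the divergence is a convex function.
    """
    N = sum(n)                                    # total particle number
    E = sum(e * k for e, k in zip(eps, n))        # total energy
    a, b = 1.0, min(eps) - 1.0                    # start with a*eps_j - b >= 1

    def F(a, b):
        return -sum(math.log1p(-math.exp(-(a * e - b))) for e in eps) + a * E - b * N

    for _ in range(max_iter):
        m = [1.0 / math.expm1(a * e - b) for e in eps]    # model occupation numbers
        ga = E - sum(e * mj for e, mj in zip(eps, m))     # dF/da
        gb = sum(m) - N                                   # dF/db
        if max(abs(ga), abs(gb)) < tol:
            break
        w = [mj * (mj + 1.0) for mj in m]                 # exp(x) / (exp(x) - 1)^2
        haa = sum(e * e * wj for e, wj in zip(eps, w))
        hab = -sum(e * wj for e, wj in zip(eps, w))
        hbb = sum(w)
        det = haa * hbb - hab * hab
        da = (-ga * hbb + gb * hab) / det                 # Newton direction
        db = (-gb * haa + ga * hab) / det
        t, f0, slope = 1.0, F(a, b), ga * da + gb * db
        for _ in range(60):                               # backtracking line search
            a1, b1 = a + t * da, b + t * db
            feasible = a1 > 0 and all(a1 * e - b1 > 0 for e in eps)
            if feasible and F(a1, b1) <= f0 + 1e-4 * t * slope:
                a, b = a1, b1
                break
            t *= 0.5
        else:
            break                                         # no further progress
    return a, b / a

# Two levels with occupations (2, 1): the fit reproduces the data exactly.
beta, mu = fit([2, 1], [1.0, 2.0])
print(beta, mu, divergence([2, 1], beta, mu, [1.0, 2.0]))
```

For two levels the stationarity conditions can be solved by hand, which gives a check on the iteration: the occupations 2 and 1 at energies 1 and 2 correspond to $\beta = \ln(4/3)$ and $\mu = \ln(8/9)/\ln(4/3)$.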

Derivatives of (3) with respect to $\beta$ and $\mu$ can be calculated easily.
It follows that, if the minimization procedure has a solution, then it
satisfies the pair of equations

nown in statistical physics, see for instance
A metric tensor $g(\beta,\mu)$ is given by the matrix of second derivatives of
$D(n\|\beta,\mu)$, evaluated at the minimum. One finds


$$g(\beta,\mu) = \sum_j \frac{\exp(\beta(\epsilon_j-\mu))}{[\exp(\beta(\epsilon_j-\mu)) - 1]^2}
\begin{pmatrix} (\epsilon_j-\mu)^2 & -\beta(\epsilon_j-\mu) \\ -\beta(\epsilon_j-\mu) & \beta^2 \end{pmatrix}. \qquad (6)$$


It is positive definite and does not depend on the choice of coordinates
$\beta, \mu$ of the statistical manifold M, as it should be.
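
As an illustration, the matrix (6) can be evaluated for a finite set of levels and tested for positive definiteness. This is a minimal sketch: the truncation to finitely many levels, the example values and the names `metric` and `is_positive_definite` are assumptions made here.

```python
import math

def metric(beta, mu, eps):
    """Metric tensor g(beta, mu) of equation (6) for finitely many levels eps."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for e in eps:
        x = beta * (e - mu)
        w = math.exp(x) / math.expm1(x) ** 2      # exp(x) / (exp(x) - 1)^2
        g[0][0] += w * (e - mu) ** 2
        g[0][1] -= w * beta * (e - mu)
        g[1][1] += w * beta ** 2
    g[1][0] = g[0][1]                             # the matrix is symmetric
    return g

def is_positive_definite(g):
    """Sylvester criterion for a symmetric 2x2 matrix."""
    return g[0][0] > 0 and g[0][0] * g[1][1] - g[0][1] * g[1][0] > 0

g = metric(0.5, -1.0, [0.0, 1.0, 2.0, 3.0])       # example parameters and levels
print(g, is_positive_definite(g))
```

The determinant is strictly positive as soon as at least two of the levels differ, by the Cauchy-Schwarz inequality applied to the weighted sums in (6).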

The next step is the introduction of covariant derivatives $\nabla_a$,
$a = \beta, \mu$, such that for all n the Hessian of the divergence
$D(n\|\beta,\mu)$, evaluated at the projection point $(\beta,\mu)$ of n on M,
equals the Hessian of a potential $\Phi(\beta,\mu)$. The corresponding
connection $\omega$ satisfies


$$\nabla_a \partial_b = \omega^c_{ab}\, \partial_c. \qquad (7)$$

The existence of this connection $\omega$ shows that the Hessian
$\nabla_a \nabla_b D(n\|\beta,\mu)$ is constant on the set of all n which
project on the point $(\beta,\mu)$ and equals the metric tensor g. This gives
the inverse of g the meaning of a Fisher information.

A method for calculating $\omega$ is given in [?]. It turns out that all
coefficients of $\omega$ vanish except = m


3 Quantum Measurements 

The quantum analogue of a probability distribution is a density matrix. In the
finite-dimensional case this is a positive semidefinite matrix whose trace
equals 1. Its eigenvalues $\lambda_j$ satisfy $\lambda_j \ge 0$ and
$\sum_j \lambda_j = 1$. Hence they can be interpreted as probabilities.

On the other hand the state of the quantum system, in the simplest case,
is described by a wave function $\psi$. This is a normalized element of a
Hilbert space $\mathcal{H}$. Let $|\psi\rangle\langle\psi|$ denote the
orthogonal projection onto the subspace $\mathbb{C}\psi$. This is a density
matrix of rank 1. It is generally believed that a measurement on the quantum
system with wave function $\psi$ yields the diagonal part of the matrix
$|\psi\rangle\langle\psi|$ in an orthonormal basis the choice of which is
dictated by the experimental setup. Let $(e_j)_j$ denote this basis. Then the
measured quantities are the numbers $|\langle e_j|\psi\rangle|^2$ (here
$\langle\cdot|\cdot\rangle$ is the inner product of the Hilbert space).
These are the diagonal elements of the matrix $|\psi\rangle\langle\psi|$. The
diagonal part of this projection operator is again a density matrix, which we
denote $\mathrm{diag}(|\psi\rangle\langle\psi|)$. It is the result of the
experiment.

The map


$$E : |\psi\rangle\langle\psi| \mapsto \mathrm{diag}(|\psi\rangle\langle\psi|) \qquad (8)$$

can be seen as a conditioning which is introduced by the experimental setup.
Indeed, E is a quantum conditional expectation in the terminology of Petz
(Chap. 9 of [6]). See Appendix A. Petz gives an overview of quantum
probability theory as it is known today. The part on conditional expectations
originated with the work of Accardi and Cecchini [?] and relies on
Tomita-Takesaki theory.
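
A minimal sketch of the map (8) in a two-dimensional Hilbert space follows. The measurement basis is taken to be the standard basis, the wave function is an arbitrary example, and the helper names `projector` and `diag` are assumptions introduced here.

```python
import math

def projector(psi):
    """Matrix of |psi><psi| for a normalized wave function psi."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def diag(matrix):
    """Diagonal part: keep the diagonal entries, set all others to zero."""
    n = len(matrix)
    return [[matrix[i][j] if i == j else 0j for j in range(n)] for i in range(n)]

# Example wave function with components of equal modulus.
psi = [complex(1, 0) / math.sqrt(2), complex(0, 1) / math.sqrt(2)]
conditioned = diag(projector(psi))

# The diagonal entries are the measured quantities |<e_j|psi>|^2.
probabilities = [conditioned[j][j].real for j in range(len(psi))]
print(probabilities, sum(probabilities))
```

The off-diagonal entries of $|\psi\rangle\langle\psi|$, which carry the relative phase of the two components, are discarded by the conditioning.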

We are interested in the question of how the conditioning interferes with the
modeling of experimental data using a divergence function. The quantum
analogue of the Kullback-Leibler divergence (also called the relative entropy)
has density matrices as its arguments. It is given by

$$D(\sigma\|\rho) = \mathrm{Tr}\,\sigma\ln\sigma - \mathrm{Tr}\,\sigma\ln\rho. \qquad (9)$$

Let X denote the set of density matrices $\sigma$ for which $\sigma\ln\sigma$
is trace-class. Let $V_\sigma$ denote the set of density matrices $\rho$ such
that $\mathcal{R}(\sigma) \subset \mathcal{R}(\rho)$ and $\sigma\ln\rho$ is a
trace-class operator. The domain of D is then


$$B = \{(\sigma,\rho) : \sigma \in X,\ \rho \in V_\sigma\}. \qquad (10)$$

For the sake of completeness we repeat here the following well-known result
(see Theorem 5.5 of [?]).

Theorem 3.1 $D(\sigma\|\rho) \ge 0$, with equality if and only if $\sigma = \rho$.


Fix now a model manifold M. It is tradition to work with the quantum
analogue of a Boltzmann-Gibbs distribution, which is a probability distribution
belonging to the exponential family (see for instance [?]). The parametrized
density matrix $\rho_\theta \in M$ is of the form


$$\rho_\theta = \frac{1}{Z(\theta)}\, e^{-\theta^k H_k} \qquad (11)$$

with normalization

$$Z(\theta) = \mathrm{Tr}\, e^{-\theta^k H_k}. \qquad (12)$$


The operators $H_k$ are self-adjoint. Together they form the Hamiltonian of the
system under consideration. The parameters $\theta^1, \theta^2, \ldots, \theta^n$
are real numbers. Note that Einstein’s summation convention is used.

The estimation problem is the question about the optimal choice of the
parameters $\theta^k$ given the result $\sigma$ of the experiment. The proposal
of information geometry is to use the divergence function (9) to calculate the
orthogonal projection $\rho_\sigma$ of $\sigma$ onto the model manifold
$M = \{\rho_\theta : \theta \in \Theta\}$. The projection is said to be
orthogonal because the Pythagorean relation

$$D(\sigma\|\rho_\theta) = D(\sigma\|\rho_\sigma) + D(\rho_\sigma\|\rho_\theta) \qquad (13)$$


holds for all $\theta$ in $\Theta$.

Assume now that an experiment is done in the basis $(\psi_n)_n$ in which the
elements of M are diagonal. Let $\sigma_c = \mathrm{diag}(\sigma)$ as before.
Then it follows from Theorem 9.3 of [6] that

$$D(\sigma\|\rho) = D(\sigma\|\sigma_c) + D(\sigma_c\|\rho) \quad \text{for all } \rho \in M. \qquad (14)$$

Now fitting the result $\sigma_c$ of the experiment with elements of M is
equivalent to fitting the unknown density matrix $\sigma$ because the
difference of the two divergences is constant, equal to $D(\sigma\|\sigma_c)$.
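
The decomposition (14) can be checked numerically in the smallest non-trivial case. The sketch below assumes a real symmetric 2x2 density matrix and a diagonal model point, so that $\mathrm{Tr}\,\sigma\ln\sigma$ follows from the two eigenvalues and $\mathrm{Tr}\,\sigma\ln\rho$ only involves the diagonal of $\sigma$. The numerical values and helper names are illustrative assumptions.

```python
import math

def eigenvalues(s):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, c], [c, d]]."""
    (a, c), (_, d) = s
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, c)
    return [mean + radius, mean - radius]

def entropy_term(s):
    """Tr s ln s, computed from the eigenvalues (0 ln 0 is taken as 0)."""
    return sum(l * math.log(l) for l in eigenvalues(s) if l > 0)

def divergence_to_diagonal(s, r):
    """D(s || rho) of equation (9) for a diagonal, positive rho = diag(r)."""
    return entropy_term(s) - sum(s[j][j] * math.log(r[j]) for j in range(2))

sigma = [[0.7, 0.2], [0.2, 0.3]]                 # example state, not diagonal
sigma_c = [sigma[0][0], sigma[1][1]]             # diagonal part of sigma
rho = [0.6, 0.4]                                 # diagonal model point

lhs = divergence_to_diagonal(sigma, rho)
rhs = (divergence_to_diagonal(sigma, sigma_c)
       + divergence_to_diagonal([[sigma_c[0], 0.0], [0.0, sigma_c[1]]], rho))
print(lhs, rhs)
```

The term $D(\sigma\|\sigma_c)$ does not involve $\rho$, which is why minimizing over the diagonal model gives the same optimum for $\sigma$ and for $\sigma_c$.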


4 Weak Measurements 

In many recent experiments the actual state of the system, which is described by
the density matrix $\sigma$, is measured in a basis $(\psi_n)_n$ in which
$\sigma$ is far from diagonal. Many of these experiments involve so-called
quantum entangled particles. They confirm [5] that the Bell inequalities, which
are derived using probabilistic arguments (see for instance [?]), can be
violated.

Because one knows that the actual state $\sigma$ is not diagonal one tries to
fit a model which is not diagonal as well. In such a case the above argument
based on (14) cannot be used. Instead, the conditioning implied by the
experimental setup should be included in the modeling of the experiment.

Introduce a conditional manifold

$$M_c = \{\rho_c : \rho \in M \text{ and } \rho > 0\}, \quad \text{where } \rho_c = \mathrm{diag}(\rho). \qquad (15)$$

The relation (14) then shows that the optimal $\rho_c$, minimizing the
divergence $D(\sigma_c\|\rho_c)$, also minimizes $D(\sigma\|\rho_c)$.

It can happen (see footnote 1) that $M_c$ is in the border region of the
manifold of positive-definite matrices, where the value of the function
$\rho_c \mapsto D(\sigma\|\rho_c)$ can become very large. This is similar to
the effect exploited in weak measurements [?], namely that the denominator of
the so-called weak value can become very small. See Appendix B. This suggests
that weak measurements can be understood by the behavior of the divergence
function $\rho_c \mapsto D(\sigma\|\rho_c)$ in the border region.
This idea requires further exploration.
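
The growth of the divergence near the border can be made concrete in two dimensions. In the sketch below the state $\sigma$ and the family $\rho_c = \mathrm{diag}(1-\delta, \delta)$ are assumptions chosen for illustration; the divergence then grows like $-\sigma_{22}\ln\delta$ as $\delta$ tends to zero.

```python
import math

def divergence_to_diagonal(sigma, r):
    """D(sigma || rho) for a real symmetric 2x2 sigma and rho = diag(r)."""
    (a, c), (_, d) = sigma
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, c)
    tr_s_ln_s = sum(l * math.log(l) for l in (mean + radius, mean - radius) if l > 0)
    return tr_s_ln_s - a * math.log(r[0]) - d * math.log(r[1])

sigma = [[0.5, 0.4], [0.4, 0.5]]                 # example state, far from diagonal
deltas = (1e-1, 1e-3, 1e-6, 1e-9)
values = []
for delta in deltas:
    rho_c = (1.0 - delta, delta)                 # approaches the border as delta -> 0
    values.append(divergence_to_diagonal(sigma, rho_c))
    print(delta, values[-1])
```

The divergence is finite for every $\delta > 0$ but unbounded on the approach to the border, so a small change of the model point there produces a large change of the divergence.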

In the more common von Neumann type of experiments the measurement
disturbs the quantum system so strongly that the conditioning of
the experimental setup also changes the state of the quantum system. Repeating
the experiment then reproduces the same outcome as that of the first
measurement. This is called the collapse of the wave function. If the outcome
$\rho_c$ of the experiment is very sensitive to small changes in the state
$\sigma$ of the quantum system then one can afford to make the interaction
between quantum system and measurement apparatus so weak that repeated
measurements become feasible. In recent experiments [?] thousands of
consecutive measurements were feasible. They reveal a gradual change of the
quantum state $\sigma$ of the system as a consequence of the measurements.

5 Summary 

Two situations are described where a divergence function is used with arguments
which are not probability distributions. In the example of the ideal Bose gas
the experimental data are sequences $n = (n_j)_j$ of non-negative integers. It
is more natural to take the sequence n as the first argument of the divergence
function rather than to introduce an empirical measure concentrating on the
data points. In the example of quantum mechanics the arguments are density
matrices. The use of density matrices as the arguments of the divergence has
been studied extensively in the context of quantum probability.

In the final part of the paper we investigate the use of divergences in the
theory of quantum measurements. Our point of view is that any quantum
measurement necessarily induces a quantum condition on the range of
experimental outcomes. The mathematical notion of a quantum conditional
expectation is used; see Appendix A. The recent development of weak quantum
measurements is cast into this terminology. The distinction is made between the
conditioning of the experimental outcomes, which is unavoidable, and the
conditioning of the actual state of the system, which is avoided by the weak
measurements. The possible importance of the border of the manifold of positive
definite density matrices is pointed out.

Footnote 1: If $M_c$ is empty there is not much to tell.


A Quantum Conditional Expectations 

Following Petz (see Chapter 9 of [6]) a conditional expectation consists of a
subalgebra $\mathcal{A}$ of the algebra $\mathcal{B}$ of bounded linear
operators in the Hilbert space $\mathcal{H}$ together with a linear map
$E : \mathcal{B} \to \mathcal{A}$. They should satisfy

• $I$ belongs to $\mathcal{A}$ and $E(I) = I$.

• If $A \in \mathcal{A}$ then also $A^* \in \mathcal{A}$.

• If $B$ is positive then also $E(B)$ is positive.

• $E(AB) = A E(B)$ for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$.

Take $B = I$ in the latter to find that $E(A) = A$ for all $A$ in $\mathcal{A}$.

In the terminology of [6] a density matrix $\rho$ is preserved by the
conditional expectation $(\mathcal{A}, E)$ if


$$\mathrm{Tr}\,\rho B = \mathrm{Tr}\,\rho E(B) \qquad (16)$$


holds for all $B$ in $\mathcal{B}$.

Now, let an orthonormal basis $(\psi_n)_n$ of $\mathcal{H}$ be given. Then any
bounded operator $B$ has matrix elements
$(\langle\psi_m|B\psi_n\rangle)_{m,n}$. The diagonal part of the operator $B$
is then defined by linear extension of

$$\mathrm{diag}(B)\,\psi_n = \langle\psi_n|B\psi_n\rangle\,\psi_n. \qquad (17)$$

The map $B \mapsto \mathrm{diag}(B)$, together with the algebra of all diagonal
operators, is a conditional expectation. In addition, for any density matrix
$\rho$ the diagonal part $\rho_c = \mathrm{diag}(\rho)$ is again a density
matrix and it is preserved by this conditional expectation. Indeed, one has for
all $\rho$ and $B$

$$\mathrm{Tr}\,\rho_c B = \mathrm{Tr}\,\rho_c\,\mathrm{diag}(B). \qquad (18)$$
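
The module property $E(AB) = A E(B)$ and the relation (18) can be verified directly for 2x2 matrices. The sketch below is only an illustration; the matrices and the helper names `matmul`, `diag` and `trace` are assumptions chosen here.

```python
def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diag(x):
    """Diagonal part of a matrix in the chosen basis."""
    n = len(x)
    return [[x[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]

def trace(x):
    """Sum of the diagonal entries."""
    return sum(x[i][i] for i in range(len(x)))

A = [[2.0, 0.0], [0.0, -1.0]]          # element of the diagonal subalgebra
B = [[1.0, 3.0], [3.0, 4.0]]           # arbitrary operator
rho = [[0.7, 0.2], [0.2, 0.3]]         # example density matrix
rho_c = diag(rho)

module_property = diag(matmul(A, B)) == matmul(A, diag(B))          # E(AB) = A E(B)
preserved = abs(trace(matmul(rho_c, B)) - trace(matmul(rho_c, diag(B)))) < 1e-12
print(module_property, preserved)
```

Both checks rely on $A$ and $\rho_c$ being diagonal: multiplication by a diagonal matrix rescales rows without mixing diagonal and off-diagonal entries.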

B Weak Measurement Theory 

In the seminal paper [?] about weak measurements an experimental setup is
proposed. The quantum system contains two parts. The first part is the system
of interest. It is weakly coupled to the second part. On the latter von Neumann
type measurements are performed to collect data. The subsequent experimental
implementations follow the same scheme. See for instance [?]. In the present
paper only the first part of the experimental setup is considered as the quantum
system. The remainder is then considered to be part of the measuring apparatus.

Ref. [12] discusses the notions of pre and post selected states. The density
matrix $\sigma = |\psi\rangle\langle\psi|$ of the present paper describes the
preselected state. It is the initial state of the experiment transported
forward in time using the Schrödinger equation. In a von Neumann type of
measurement the post selected state is the state
$|\psi_f\rangle\langle\psi_f|$ obtained after the collapse of the wave
function, transported backwards in time to the point where it meets the
preselected state. The claim of [?] is that the result of the measurement is a
so-called weak value of an operator C, which is the operator of the quantum
system to which the measurement apparatus couples. This weak value is given by


$$\langle C \rangle = \frac{\langle\psi_f|C\psi\rangle}{\langle\psi_f|\psi\rangle}. \qquad (19)$$


It can become arbitrarily large by setting up the experiment in such a way that
the overlap $|\langle\psi_f|\psi\rangle|^2$ of the pre and post selected states
is very small. This theory of weak measurements has been criticized in the
literature (see the references in [13]). Additional experiments are needed for
its validation.
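
A minimal numerical sketch of the weak value (19) for a two-level system follows. The operator C, the two states and the helper names `inner` and `weak_value` are assumptions chosen for illustration; for this choice the weak value equals the cotangent of the angle `gap`, so it exceeds the eigenvalues $\pm 1$ of C as the overlap shrinks.

```python
import math

def inner(phi, psi):
    """Inner product <phi|psi>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

def weak_value(psi_f, C, psi):
    """Weak value <psi_f|C psi> / <psi_f|psi> of equation (19)."""
    C_psi = [sum(C[i][k] * psi[k] for k in range(len(psi))) for i in range(len(C))]
    return inner(psi_f, C_psi) / inner(psi_f, psi)

C = [[1, 0], [0, -1]]                                 # operator with eigenvalues +1, -1
psi_f = [1 / math.sqrt(2), 1 / math.sqrt(2)]          # post selected state
for gap in (0.5, 0.1, 0.01):
    t = math.pi / 4 - gap
    psi = [math.cos(t), -math.sin(t)]                 # preselected state
    overlap = abs(inner(psi_f, psi)) ** 2             # equals sin(gap)^2
    print(gap, overlap, weak_value(psi_f, C, psi).real)
```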

In the terminology of the present paper the coupling via the operator C 
induces a conditioning on the outcomes of the experiment. 